The first session will focus on reading in the data after steinbock segmentation (see Session 2).

Read in the data

We use the imcRtools package to read in single-cell data extracted using the steinbock framework. During image processing you will also obtain multi-channel images and segmentation masks that can be read into R using the cytomapper package.

library(imcRtools)
library(cytomapper)

Read in single-cell information

For single-cell data analysis in R the framework. It allows standardized access to (i) expression data, (ii) cellular metadata (e.g. cell type), (iii) feature metadata (e.g. marker name) and (iv) experiment-wide metadata. For an in-depth introduction to the SingleCellExperiment container, please refer to the SingleCellExperiment class.

The SpatialExperiment class is an extension of the SingleCellExperiment class. It was developed to store spatial data in addition to single-cell data and an extended introduction is accessible here.

To read in single-cell data generated by the steinbock framework the imcRtools package provides the read_steinbock function. By default, the data is read into a SpatialExperiment object; however, data can be read in as a SingleCellExperiment object by setting return_as = "sce". All functions presented in this tutorial are applicable to both data containers.

steinbock generated data

The downloaded example data (see data) processed with the steinbock framework can be read in with the read_steinbock function provided by imcRtools. For more information, please refer to ?read_steinbock.

spe <- read_steinbock("data/steinbock/")
spe
## class: SpatialExperiment 
## dim: 40 47859 
## metadata(0):
## assays(1): counts
## rownames(40): MPO HistoneH3 ... DNA1 DNA2
## rowData names(11): channel name ... Final.Concentration...Dilution
##   uL.to.add
## colnames: NULL
## colData names(8): sample_id ObjectNumber ... width_px height_px
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : Pos_X Pos_Y
## imgData names(1): sample_id

By default, single-cell data is read in as SpatialExperiment object. The summarized pixel intensities per channel and cell (here mean intensity) are stored in the counts slot. Columns represent cells and rows represent channels.

counts(spe)[1:5,1:5]
##                [,1]       [,2]      [,3]      [,4]      [,5]
## MPO       0.5751064  0.4166667 0.4975494  0.890154 0.1818182
## HistoneH3 3.1273082 11.3597883 2.3841440  7.712961 1.4512715
## SMA       0.2600939  1.6720383 0.1535190  1.193948 0.2986703
## CD16      2.0347747  2.5880536 2.2943074 15.629083 0.6084220
## CD38      0.2530137  0.6826669 1.1902979  2.126060 0.2917793

Metadata associated to individual cells are stored in the colData slot. After initial image processing, these metadata include the numeric identifier (ObjectNumber), the area, and morphological features of each cell. In addition, sample_id stores the image name from which each cell was extracted and the width and height of the corresponding images are stored.

head(colData(spe))
## DataFrame with 6 rows and 8 columns
##      sample_id ObjectNumber      area axis_major_length axis_minor_length
##    <character>    <numeric> <numeric>         <numeric>         <numeric>
## 1 Patient1_001            1        12           7.40623           1.89529
## 2 Patient1_001            2        24          16.48004           1.96284
## 3 Patient1_001            3        17           9.85085           1.98582
## 4 Patient1_001            4        24           8.08290           3.91578
## 5 Patient1_001            5        22           8.79367           3.11653
## 6 Patient1_001            6        25           9.17436           3.46929
##   eccentricity  width_px height_px
##      <numeric> <numeric> <numeric>
## 1     0.966702       600       600
## 2     0.992882       600       600
## 3     0.979470       600       600
## 4     0.874818       600       600
## 5     0.935091       600       600
## 6     0.925744       600       600

The main difference between the SpatialExperiment and the SingleCellExperiment data container in the current setting is the way spatial locations of all cells are stored. For the SingleCellExperiment container, the locations are stored in the colData slot while the SpatialExperiment container stores them in the spatialCoords slot:

head(spatialCoords(spe))
##      Pos_X     Pos_Y
## 1 468.5833 0.4166667
## 2 515.8333 0.4166667
## 3 587.2353 0.4705882
## 4 192.2500 1.2500000
## 5 231.7727 0.9090909
## 6 270.1600 1.0400000

The spatial cell graphs generated by steinbock are read into a colPair slot of the SpatialExperiment (or SingleCellExperiment) object. Cell-cell interactions (cells in close spatial proximity) are represented as “edge list” (stored as SelfHits object). Here, the left side represents the column indices of the “from” cells and the right side represents the column indices of the “to” cells. We will later see how to visualize the spatial cell graphs.

colPair(spe, "neighborhood")
## SelfHits object with 257116 hits and 0 metadata columns:
##                 from        to
##            <integer> <integer>
##        [1]         1        27
##        [2]         1        55
##        [3]         2        10
##        [4]         2        44
##        [5]         2        81
##        ...       ...       ...
##   [257112]     47858     47836
##   [257113]     47859     47792
##   [257114]     47859     47819
##   [257115]     47859     47828
##   [257116]     47859     47854
##   -------
##   nnode: 47859

Finally, metadata regarding the channels are stored in the rowData slot. This information is extracted from the panel.csv file. Channels are ordered by isotope mass and therefore match the channel order of the multi-channel images.

head(rowData(spe))
## DataFrame with 6 rows and 11 columns
##               channel        name      keep   ilastik  deepcell Tube.Number
##           <character> <character> <numeric> <numeric> <numeric>   <numeric>
## MPO               Y89         MPO         1        NA        NA        2101
## HistoneH3       In113   HistoneH3         1         1         1        2113
## SMA             In115         SMA         1        NA        NA        1914
## CD16            Pr141        CD16         1        NA        NA        2079
## CD38            Nd142        CD38         1        NA        NA        2095
## HLADR           Nd143       HLADR         1        NA        NA        2087
##                        Target Antibody.Clone Stock.Concentration
##                   <character>    <character>           <numeric>
## MPO       Myeloperoxidase MPO Polyclonal MPO                 500
## HistoneH3          Histone H3           D1H2                 500
## SMA                       SMA            1A4                 500
## CD16                     CD16       EPR16784                 500
## CD38                     CD38        EPR4106                 500
## HLADR                  HLA-DR        TAL 1B5                 500
##           Final.Concentration...Dilution   uL.to.add
##                              <character> <character>
## MPO                              4 ug/mL         0.8
## HistoneH3                        1 ug/mL         0.2
## SMA                           0.25 ug/mL        0.05
## CD16                             5 ug/mL           1
## CD38                           2.5 ug/mL         0.5
## HLADR                            1 ug/mL         0.2

Reading custom files

When not using steinbock, the single-cell information has to be read in from custom files. We now demonstrate how to generate a SpatialExperiment object from single-cell data contained in individual files. As an example, we use files generated by steinbock.

First we will read in the single-cell features stored in a CSV file:

library(readr)

cur_intensities <- read_csv("data/steinbock/intensities/Patient1_001.csv")
cur_regionprobs <- read_csv("data/steinbock/regionprops/Patient1_001.csv")

dim(cur_intensities)
## [1] 3567   41
dim(cur_regionprobs)
## [1] 3567    7
colnames(cur_intensities)
##  [1] "Object"            "MPO"               "HistoneH3"        
##  [4] "SMA"               "CD16"              "CD38"             
##  [7] "HLADR"             "CD27"              "CD15"             
## [10] "CD45RA"            "CD163"             "B2M"              
## [13] "CD20"              "CD68"              "Ido1"             
## [16] "CD3"               "LAG3 / LAG33"      "CD11c"            
## [19] "PD1"               "PDGFRb"            "CD7"              
## [22] "GrzB"              "PDL1"              "TCF7"             
## [25] "CD45RO"            "FOXP3"             "ICOS"             
## [28] "CD8a"              "CarbonicAnhydrase" "CD33"             
## [31] "Ki67"              "VISTA"             "CD40"             
## [34] "CD4"               "CD14"              "Ecad"             
## [37] "CD303"             "CD206"             "cleavedPARP"      
## [40] "DNA1"              "DNA2"
colnames(cur_regionprobs)
## [1] "Object"            "area"              "centroid-0"       
## [4] "centroid-1"        "axis_major_length" "axis_minor_length"
## [7] "eccentricity"

These files contain single-cell features including the cell identifier (Object), morphological features, the cells’ locations (centroid-0 and centroid-1) and the mean pixel intensity per cell and per channel (cur_intensities).

Now we will extract the relevant entries from the files.

counts <- cur_intensities[,-1]

meta <- cur_regionprobs[,c("Object", "area", "axis_major_length", "axis_minor_length")]

coords <- cur_regionprobs[,c("centroid-1", "centroid-0")]

From these features we can now construct the SpatialExperiment object.

library(SpatialExperiment)
spe2 <- SpatialExperiment(assays = list(counts = t(counts)),
                          colData = meta, 
                          sample_id = "Patient1_001",
                          spatialCoords = as.matrix(coords))

Next, we can store the spatial cell graph generated by steinbock in the colPairs slot of the object. Spatial cell graphs are usually stored as edge list in form of a CSV file. The colPairs slot requires a SelfHits entry storing an edge list where numeric entries represent the index of the from and to cell in the SpatialExperiment object. To generate such an edge list, we need to match the cell IDs contained in the CSV against the cell IDs in the SpatialExperiment object.

cur_pairs <- read_csv("data/steinbock/neighbors/Patient1_001.csv")

edgelist <- SelfHits(from = match(cur_pairs$Object, spe2$Object),
                     to = match(cur_pairs$Neighbor, spe2$Object),
                     nnode = ncol(spe2))

colPair(spe2, "neighborhood") <- edgelist

For further downstream analysis, we will use the steinbock results as read in above.

Single-cell processing

After reading in the single-cell data, few further processing steps need to be taken.

Add additional metadata

We can set the colnames of the object to generate unique identifiers per cell:

colnames(spe) <- paste0(spe$sample_id, "_", spe$ObjectNumber)

It is also often the case that sample-specific metadata are available externally. For the current data, we need to link the cancer type (also referred to as “Indication”) to each sample. This metadata is available as external excel file:

library(openxlsx)
library(stringr)
meta <- read.xlsx("data/steinbock/sample_metadata.xlsx")

spe$patient_id <- as.vector(str_extract_all(spe$sample_id, "Patient[1-4]", simplify = TRUE))
spe$ROI <- as.vector(str_extract_all(spe$sample_id, "00[1-8]", simplify = TRUE))
spe$indication <- meta$Indication[match(spe$patient_id, meta$Sample.ID)]

unique(spe$indication)
## [1] "SCCHN" "BCC"   "NSCLC" "CRC"

The selected patients were diagnosed with different cancer types:

  • SCCHN - head and neck cancer
  • BCC - breast cancer
  • NSCLC - lung cancer
  • CRC - colorectal cancer

Transform counts

The distribution of expression counts across cells is often observed to be skewed towards the right side meaning lots of cells display low counts and few cells have high counts. To avoid analysis biases from these high-expressing cells, the expression counts are commonly transformed or clipped.

Here, we perform counts transformation using an inverse hyperbolic sine function. This transformation is commonly applied to flow cytometry data. The cofactor here defines the expression range on which no scaling is performed. While the cofactor for CyTOF data is often set to 5, IMC data usually display much lower counts. We therefore apply a cofactor of 1.

However, other transformations such as log(counts(spe) + 0.01) should be tested when analysing IMC data.

library(dittoSeq)
dittoRidgePlot(spe, var = "CD3", group.by = "patient_id", assay = "counts") +
    ggtitle("CD3 - before transformation")

assay(spe, "exprs") <- asinh(counts(spe)/1)
dittoRidgePlot(spe, var = "CD3", group.by = "patient_id", assay = "exprs") +
    ggtitle("CD3 - after transformation")

Define interesting channels

For downstream analysis such as visualization, dimensionality reduction and clustering, only a subset of markers should be used. As convenience, we can store an additional entry in the rowData slot that specifies the markers of interest. Here, we deselect the nuclear markers, which were primarily used for cell segmentation, and keep all other biological targets.

rowData(spe)$use_channel <- !grepl("DNA|Histone", rownames(spe))

Define color schemes

We will define color schemes for different metadata entries of the data and conveniently store them in the metadata slot of the SpatialExperiment which will be helpful for downstream data visualizations. We will use colors from the RColorBrewer and dittoSeq package but any other coloring package will suffice.

library(RColorBrewer)
color_vectors <- list()

ROI <- setNames(brewer.pal(length(unique(spe$ROI)), name = "BrBG"), 
                unique(spe$ROI))
patient_id <- setNames(brewer.pal(length(unique(spe$patient_id)), name = "Set1"), 
                unique(spe$patient_id))
sample_id <- setNames(c(brewer.pal(6, "YlOrRd")[3:5],
                        brewer.pal(6, "PuBu")[3:6],
                        brewer.pal(6, "YlGn")[3:5],
                        brewer.pal(6, "BuPu")[3:6]),
                unique(spe$sample_id))
indication <- setNames(brewer.pal(length(unique(spe$indication)), name = "Set2"), 
                unique(spe$indication))

color_vectors$ROI <- ROI
color_vectors$patient_id <- patient_id
color_vectors$sample_id <- sample_id
color_vectors$indication <- indication

metadata(spe)$color_vectors <- color_vectors

Read in images

The cytomapper package allows multi-channel image handling and visualization within the Bioconductor framework. The most common data format for multi-channel images or segmentation masks is the TIFF file format, which is used by steinbock.

Here, we will read in multi-channel images and segmentation masks into a CytoImageList data container. It allows storing multiple multi-channel images and requires matched channels across all images within the object.

The loadImages function is used to read in processed multi-channel images and their corresponding segmentation masks. Of note, the multi-channel images generated by steinbock are saved as 32-bit images while the segmentation masks are saved as 16-bit images. To correctly scale pixel values of the segmentation masks when reading them in set as.is = TRUE.

images <- loadImages("data/steinbock/img/")
## All files in the provided location will be read in.
masks <- loadImages("data/steinbock/masks/", as.is = TRUE)
## All files in the provided location will be read in.

In the case of multi-channel images, it is beneficial to set the channelNames for easy visualization. Using the steinbock framework, the channel order of the single-cell data matches the channel order of the multi-channel images. However, it is recommended to make sure that the channel order is identical between the single-cell data and the images.

channelNames(images) <- rownames(spe)
images
## CytoImageList containing 14 image(s)
## names(14): Patient1_001 Patient1_002 Patient1_003 Patient2_001 Patient2_002 Patient2_003 Patient2_004 Patient3_001 Patient3_002 Patient3_003 Patient4_005 Patient4_006 Patient4_007 Patient4_008 
## Each image contains 40 channel(s)
## channelNames(40): MPO HistoneH3 SMA CD16 CD38 HLADR CD27 CD15 CD45RA CD163 B2M CD20 CD68 Ido1 CD3 LAG3 / LAG33 CD11c PD1 PDGFRb CD7 GrzB PDL1 TCF7 CD45RO FOXP3 ICOS CD8a CarbonicAnhydrase CD33 Ki67 VISTA CD40 CD4 CD14 Ecad CD303 CD206 cleavedPARP DNA1 DNA2

For image and mask visualization we will need to add additional metadata to the elementMetadata slot of the CytoImageList objects. This slot is easily accessible using the mcols function.

Here, we will save the matched sample_id, patient_id and indication information within the elementMetadata slot of the multi-channel images and segmentation masks objects. It is crucial that the order of the images in both CytoImageList objects is the same.

all.equal(names(images), names(masks))
## [1] TRUE
patient_id <- str_extract_all(names(images), "Patient[1-4]", simplify = TRUE)
indication <- meta$Indication[match(patient_id, meta$Sample.ID)] 

mcols(images) <- mcols(masks) <- DataFrame(sample_id = names(images),
                                           patient_id = patient_id,
                                           indication = indication)

Generate single-cell data from images

An alternative way of generating a SingleCellExperiment object directly from the multi-channel images and segmentation masks is supported by the measureObjects function of the cytomapper package. For each cell present in the masks object, the function computes the mean pixel intensity per channel as well as morphological features (area, radius, major axis length, eccentricity) and the location of cells:

cytomapper_sce <- measureObjects(masks, image = images, img_id = "sample_id")

cytomapper_sce
## class: SingleCellExperiment 
## dim: 40 47859 
## metadata(0):
## assays(1): counts
## rownames(40): MPO HistoneH3 ... DNA1 DNA2
## rowData names(0):
## colnames: NULL
## colData names(10): sample_id object_id ... patient_id.V1 indication
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):

Single-cell visualization

The following section focuses on visualizing the single-cell data contained in the SpatialExperiment object. The main R/Bioconductor packages to support visualization are dittoSeq and imcRtools

Dimensionality reduction

First, we will use non-linear dimensionality reduction methods to project cells from a high-dimensional (40) down to a low-dimensional (2) space. For this the scater package provides the runUMAP and runTSNE function. To ensure reproducibility, we will need to set a seed; however different seeds and different parameter settings (e.g. the perplexity) parameter in the runTSNE function need to be tested to avoid interpreting visualization artefacts. For dimensionality reduction, we will use all channels that show biological variation across the dataset. However, marker selection can be performed with different biological questions in mind.

library(scater)

set.seed(220225)
spe <- runUMAP(spe, subset_row = rowData(spe)$use_channel, exprs_values = "exprs") 
spe <- runTSNE(spe, subset_row = rowData(spe)$use_channel, exprs_values = "exprs") 

After dimensionality reduction, the low-dimensional embeddings are stored in the reducedDim slot.

reducedDims(spe)
## List of length 2
## names(2): UMAP TSNE
head(reducedDim(spe, "UMAP"))
##                    [,1]      [,2]
## Patient1_001_1 4.736075 -3.418426
## Patient1_001_2 4.481811 -3.180779
## Patient1_001_3 4.482297 -3.166900
## Patient1_001_4 4.227500 -2.845877
## Patient1_001_5 6.190731 -2.254569
## Patient1_001_6 5.531091 -3.464475

Visualization of the low-dimensional embedding facilitates assessment of potential “batch effects”. The dittoDimPlot function allows flexible visualization. It returns ggplot objects which can be further modified.

First, we will visualize single-cell metadata such as the patient ID and the cancer type.

library(patchwork)
library(dittoSeq)
library(viridis)

# visualize patient id 
p1 <- dittoDimPlot(spe, var = "patient_id", reduction.use = "UMAP", size = 0.2) + 
    scale_color_manual(values = metadata(spe)$color_vectors$patient_id) +
    ggtitle("Patient ID on UMAP")
p2 <- dittoDimPlot(spe, var = "patient_id", reduction.use = "TSNE", size = 0.2) + 
    scale_color_manual(values = metadata(spe)$color_vectors$patient_id) +
    ggtitle("Patient ID on TSNE")

# visualize indication
p3 <- dittoDimPlot(spe, var = "indication", reduction.use = "UMAP", size = 0.2) + 
    scale_color_manual(values = metadata(spe)$color_vectors$indication) +
    ggtitle("Indication on UMAP")
p4 <- dittoDimPlot(spe, var = "indication", reduction.use = "TSNE", size = 0.2) + 
    scale_color_manual(values = metadata(spe)$color_vectors$indication) +
    ggtitle("Indication on TSNE")

(p1 + p2) / (p3 + p4)

Next, we can visualize marker expression on the UMAP embedding.

# visualize marker expression
p1 <- dittoDimPlot(spe, var = "Ecad", reduction.use = "UMAP", 
                   assay = "exprs", size = 0.2) +
    scale_color_viridis(name = "Ecad") +
    ggtitle("E-Cadherin expression on UMAP")
p2 <- dittoDimPlot(spe, var = "CD45RO", reduction.use = "UMAP", 
                   assay = "exprs", size = 0.2) +
    scale_color_viridis(name = "CD45RO") +
    ggtitle("CD45RO expression on UMAP")
p3 <- dittoDimPlot(spe, var = "CD20", reduction.use = "UMAP", 
                   assay = "exprs", size = 0.2) +
    scale_color_viridis(name = "CD20") +
    ggtitle("CD20 expression on UMAP")
p4 <- dittoDimPlot(spe, var = "CD3", reduction.use = "UMAP", 
                   assay = "exprs", size = 0.2) +
    scale_color_viridis(name = "CD3") +
    ggtitle("CD3 expression on UMAP")

(p1 + p2) / (p3 + p4)

We observe a strong separation of tumor cells (Ecad+ cells) between the patients. Here, each patient was diagnosed with a different tumor type. The separation of tumor cells could be of biological origin since tumor cells tend to display differences in expression between patients and cancer types and/or of technical origin: the panel only contains a single tumor marker (E-Cadherin) and therefore slight technical differences in staining causes visible separation between cells of different patients. Nevertheless, the immune compartment (CD45RO+ cells) mix between patients and we can rule out systematic staining differences between patients.

Visualizing marker expression

This section focuses on visualizing the expression of all markers and highlighting variation between cells, images and patients.

Per cell

First, we will visualize single-cell marker expression in form of a heatmap. Here, we sub-sample the dataset to 2000 cells for visualization purposes and overlay the cancer type from which the cells were extracted.

cur_cells <- sample(seq_len(ncol(spe)), 2000)

dittoHeatmap(spe[,cur_cells], genes = rownames(spe)[rowData(spe)$use_channel],
             assay = "exprs", cluster_cols = TRUE, scale = "none",
             heatmap.colors = viridis(100), annot.by = "indication",
             annotation_colors = list(indication = metadata(spe)$color_vectors$indication))

We can differentiate between epithelial cells (Ecad+) and immune cells (CD45RO). Some of the markers are specifically detected (e.g., Ki67, CD20, Ecad) and others are more broadly detected (e.g. HLADR, B2M, CD4).

Per image

It can be beneficial to visualize the mean marker expression per image to identify images with outlying marker expression. This check does not indicate image quality per se but can highlight biological differences. Here, we will use the aggregateAcrossCells function of the scuttle package to compute the mean expression per image. For visualization purposes, we again asinh transform the mean expression values.

library(scuttle)

image_mean <- aggregateAcrossCells(spe, 
                                   ids = spe$sample_id, 
                                   statistics="mean",
                                   use.assay.type = "counts")
assay(image_mean, "exprs") <- asinh(counts(image_mean))

dittoHeatmap(image_mean, genes = rownames(spe)[rowData(spe)$use_channel],
             assay = "exprs", cluster_cols = TRUE, scale = "none",
             heatmap.colors = viridis(100), 
             annot.by = c("indication", "patient_id", "ROI"),
             annotation_colors = list(indication = metadata(spe)$color_vectors$indication,
                                      patient_id = metadata(spe)$color_vectors$patient_id,
                                      ROI = metadata(spe)$color_vectors$ROI),
             show_colnames = TRUE)

Per patient

The data presented here originate from samples from different hospitals with potential differences in pre-processing and each sample stained individually. These (and other) technical aspects can induce staining differences between samples or batches of samples. In addition, patients were diagnosed with different cancer types. We will use ridgeline visualizations to check differences in staining patterns and biological differences in expression:

multi_dittoPlot(spe, vars = rownames(spe)[rowData(spe)$use_channel],
               group.by = "patient_id", plots = "ridgeplot", 
               assay = "exprs", 
               color.panel = metadata(spe)$color_vectors$patient_id)

We observe variations in the distributions of marker expression across patients. These variations may arise partly from different abundances of cells in different images (e.g. Patient3 may have higher numbers of CD11c+ and PD1+ cells) as well as staining differences between samples. While most of the selected markers are specifically expressed in immune cell subtypes, we can see that E-Cadherin (a marker for epithelial (tumor) cells) shows similar expression across all patients.

Image area covered by cells

A quality indicator for region selection is the image area covered by cells (or biological tissue). This metric identifies regions of interest (ROIs) where little cells are present, possibly hinting at incorrect selection of the ROI. We can compute the percentage of covered image area using the metadata contained in the SpatialExperiment object:

library(dplyr)

colData(spe) %>%
    as.data.frame() %>%
    group_by(sample_id) %>%
    summarize(cell_area = sum(area),
           no_pixels = mean(width_px) * mean(height_px)) %>%
    mutate(covered_area = cell_area / no_pixels) %>%
    ggplot() +
        geom_point(aes(reorder(sample_id,covered_area), covered_area)) + 
        theme_minimal(base_size = 15) +
        ylim(c(0, 1)) + 
        theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 8)) +
        ylab("% covered area") + xlab("")

Cell size

Next, we observe the distributions of cell size across the individual images. Differences in cell size distributions can indicate segmentation biases due to differences in cell density or can indicate biological differences due to cell type compositions (tumor cells tend to be larger than immune cells).

colData(spe) %>%
    as.data.frame() %>%
    group_by(sample_id) %>%
    ggplot() +
        geom_boxplot(aes(sample_id, area)) +
        theme_minimal(base_size = 15) + 
        theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 8)) +
        ylab("Cell area") + xlab("")

summary(spe$area)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3.00   47.00   70.00   76.38   98.00  466.00

The median cell size is 70 pixels with a median major axis length of 11.3. The largest cell has an area of 466 pixels which relates to a diameter of 21.6 pixels assuming a circular shape. Overall, the distribution of cell sizes is similar across images with images from Patient4_005 and Patient4_007 showing a reduced average cell size. These images contain fewer tumor cells which can explain the smaller average cell size.

We detect very small cells in the dataset and will remove them. The chosen threshold is arbitrary and needs to be adjusted per dataset.

sum(spe$area < 5)
## [1] 65
spe <- spe[,spe$area >= 5]

Spatial visualization

Here, we introduce the plotSpatial function of the imcRtools package to visualize the cells’ centroids and cell-cell interactions as spatial graphs.

In the following example, we select images of one patient for visualization purposes. Here, each dot (node) represents a cell and edges are drawn between cells in close physical proximity as detected by steinbock. Nodes are variably colored based on the cells’ expression level of E-cadherin and their size can be adjusted (e.g., based on measured cell area).

library(ggplot2)
library(viridis)

# steinbock interaction graph 
plotSpatial(spe[,grepl("Patient3", spe$sample_id)], 
            node_color_by = "Ecad", 
            assay_type = "exprs",
            img_id = "sample_id", 
            draw_edges = TRUE, 
            colPairName = "neighborhood", 
            nodes_first = FALSE, 
            node_size_by = "area", 
            directed = FALSE,
            edge_color_fix = "grey") + 
    scale_size_continuous(range = c(0.1, 2)) +
    ggtitle("E-cadherin expression")

Image visualization

The following section highlights the use of the cytomapper package to visualize multichannel images and segmentation masks. For visualization purposes we select 3 images from the dataset:

# Sample images
set.seed(220517)
cur_id <- sample(unique(spe$sample_id), 3)
cur_images <- images[names(images) %in% cur_id]
cur_masks <- masks[names(masks) %in% cur_id]

Pixel visualization

The following section gives examples for visualizing individual channels or multiple channels as pseudo-color composite images. For this the cytomapper package exports the plotPixels function which expects a CytoImageList object storing one or multiple multi-channel images. In the simplest use case, a single channel can be visualized as follows:

plotPixels(cur_images, 
           colour_by = "Ecad",
           bcg = list(Ecad = c(0, 5, 1)))

The plot above shows the tissue expression of the epithelial tumor marker E-cadherin on the 3 selected images. The bcg parameter (default c(0, 1, 1)) stands for “background”, “contrast”, “gamma” and controls these attributes of the image. This parameter takes a named list where each entry specifies these attributes per channel. The first value of the numeric vector will be added to the pixel intensities (background); pixel intensities will be multiplied by the second entry of the vector (contrast); pixel intensities will be exponentiated by the third entry of the vector (gamma). In most cases, it is sufficient to adjust the second (contrast) entry of the vector.

The following example highlights the visualization of 6 markers (maximum allowed number of markers) at once per image. The markers indicate the spatial distribution of tumor cells (E-cadherin), T cells (CD3), B cells (CD20), CD8+ T cells (CD8a), plasma cells (CD38) and proliferating cells (Ki67).

plotPixels(cur_images, 
           colour_by = c("Ecad", "CD3", "CD20", "CD8a", "CD38", "Ki67"),
           bcg = list(Ecad = c(0, 5, 1),
                      CD3 = c(0, 5, 1),
                      CD20 = c(0, 5, 1),
                      CD8a = c(0, 5, 1),
                      CD38 = c(0, 8, 1),
                      Ki67 = c(0, 5, 1)))

Adjusting colors

The default colors for visualization are chosen by the additive RGB (red, green, blue) color model. For six markers the default colors are: red, green, blue, cyan (green + blue), magenta (red + blue), yellow (green + red). These colors are the easiest to distinguish by eye. However, you can select other colors for each channel by setting the colour parameter:

plotPixels(cur_images, 
           colour_by = c("Ecad", "CD3", "CD20"),
           bcg = list(Ecad = c(0, 5, 1),
                      CD3 = c(0, 5, 1),
                      CD20 = c(0, 5, 1)),
           colour = list(Ecad = c("black", "burlywood1"),
                         CD3 = c("black", "cyan2"),
                         CD20 = c("black", "firebrick1")))

The colour parameter takes a named list in which each entry specifies the colors from which a color gradient is constructed via colorRampPalette. These are usually vectors of length 2 in which the first entry is "black" and the second entry specifies the color of choice. Although not recommended, you can also specify more than two colors to generate a more complex color gradient.

Image normalization

As an alternative to setting the bcg parameter, images can first be normalized. Normalization here means to scale the pixel intensities per channel between 0 and 1 (or a range specified by the ft parameter in the normalize function). By default, the normalize function scales pixel intensities across all images contained in the CytoImageList object (separateImages = FALSE). Each individual channel is scaled independently (separateChannels = TRUE).

After 0-1 normalization, maximum pixel intensities can be clipped to enhance the contrast of the image (setting the inputRange parameter). In the following example, the clipping to 0 and 0.2 is the same as multiplying the pixel intensities by a factor of 5.

# 0 - 1 channel scaling across all images
norm_images <- normalize(cur_images)

# Clip channel at 0.2
norm_images <- normalize(norm_images, inputRange = c(0, 0.2))

plotPixels(norm_images, 
           colour_by = c("Ecad", "CD3", "CD20", "CD8a", "CD38", "Ki67"))

The default setting of scaling pixel intensities across all images ensures comparable intensity levels across images. Pixel intensities can also be scaled per image therefore correcting for staining/expression differences between images:

# 0 - 1 channel scaling per image
norm_images <- normalize(cur_images, separateImages = TRUE)

# Clip channel at 0.2
norm_images <- normalize(norm_images, inputRange = c(0, 0.2))

plotPixels(norm_images, 
           colour_by = c("Ecad", "CD3", "CD20", "CD8a", "CD38", "Ki67"))

As we can see, the marker Ki67 appears brighter on image 2 and 3 in comparison to scaling the channel across all images.

Finally, the normalize function also accepts a named list input for the inputRange argument. In this list, the clipping range per channel can be set individually:

# 0 - 1 channel scaling per image
norm_images <- normalize(cur_images, 
                         separateImages = TRUE,
                         inputRange = list(Ecad = c(0, 50), 
                                           CD3 = c(0, 30),
                                           CD20 = c(0, 40),
                                           CD8a = c(0, 50),
                                           CD38 = c(0, 10),
                                           Ki67 = c(0, 70)))

plotPixels(norm_images, 
           colour_by = c("Ecad", "CD3", "CD20", "CD8a", "CD38", "Ki67"))

Segmentation mask visualization

The cytomapper package provides the plotCells function to visualize the aggregated pixel intensities per cell on segmnetation masks. In the current dataset pixel intensities were aggregated by computing the mean pixel intensity per cell and per channel. The plotCells function accepts the exprs_values argument (default counts) that allows selecting the assay which stores the expression values that should be visualized.

In the following example, we visualize the asinh-transformed mean pixel intensities of the epithelial marker E-cadherin on segmentation masks.

plotCells(cur_masks,
          object = spe, 
          cell_id = "ObjectNumber", img_id = "sample_id",
          colour_by = "Ecad",
          exprs_values = "exprs")

Segmentation quality control

We can now use the plotPixels function to outline segmented cells on image composites to observe segmentation accuracy. Without having ground-truth data readily available, a common approach to segmentation quality control is to overlay segmentation masks on composite images displaying channels that were used for segmentation.

Here, we select 3 random images and perform image- and channel-wise normalization (channels are first min-max normalized and scaled to a range of 0-1 before clipping the maximum intensity to 0.2).

library(cytomapper)
set.seed(20220118)
img_ids <- sample(seq_len(length(images)), 3)

# Normalize and clip images
cur_images <- images[img_ids]
cur_images <- normalize(cur_images, separateImages = TRUE)
cur_images <- normalize(cur_images, inputRange = c(0, 0.2))

plotPixels(cur_images,
           mask = masks[img_ids],
           img_id = "sample_id",
           missing_colour = "white",
           colour_by = c("CD163", "CD20", "CD3", "Ecad", "DNA1"),
           colour = list(CD163 = c("black", "yellow"),
                         CD20 = c("black", "red"),
                         CD3 = c("black", "green"),
                         Ecad = c("black", "cyan"),
                         DNA1 = c("black", "blue")),
           image_title = NULL,
           legend = list(colour_by.title.cex = 0.7,
                         colour_by.labels.cex = 0.7))

We can see that nuclei are centered within the segmentation masks and all cell types are correctly segmented . A common challenge here is to segment large (e.g. epithelial cells - in cyan) versus small (e.g. B cells - in red). However, the segmentation approach here appears to correctly segment cells across different sizes.

Save objects

Finally, the generated data objects can be saved for further downstream processing and analysis.

saveRDS(spe, "data/spe.rds")
saveRDS(images, "data/images.rds")
saveRDS(masks, "data/masks.rds")

Obtaining public IMC datasets

The imcdatasets R/Bioconductor package provides publicly available IMC datasets in form of SingleCellExperiment (single-cell data) and CytoImageList (images and segmentation masks) objects.

library(imcdatasets)

listDatasets()

sce <- Damond_2019_Pancreas(data_type = "sce")
images <- Damond_2019_Pancreas(data_type = "images")
masks <- Damond_2019_Pancreas(data_type = "masks")

Session info

sessionInfo()
## R version 4.2.2 (2022-10-31)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] dplyr_1.0.10                viridis_0.6.2              
##  [3] viridisLite_0.4.1           patchwork_1.1.2            
##  [5] scater_1.26.1               scuttle_1.8.3              
##  [7] RColorBrewer_1.1-3          dittoSeq_1.10.0            
##  [9] ggplot2_3.4.0               stringr_1.5.0              
## [11] openxlsx_4.2.5.1            readr_2.1.3                
## [13] cytomapper_1.10.0           EBImage_4.40.0             
## [15] imcRtools_1.4.2             SpatialExperiment_1.8.0    
## [17] SingleCellExperiment_1.20.0 SummarizedExperiment_1.28.0
## [19] Biobase_2.58.0              GenomicRanges_1.50.2       
## [21] GenomeInfoDb_1.34.4         IRanges_2.32.0             
## [23] S4Vectors_0.36.1            BiocGenerics_0.44.0        
## [25] MatrixGenerics_1.10.0       matrixStats_0.63.0         
## 
## loaded via a namespace (and not attached):
##   [1] utf8_1.2.2                shinydashboard_0.7.2     
##   [3] R.utils_2.12.2            tidyselect_1.2.0         
##   [5] htmlwidgets_1.6.0         grid_4.2.2               
##   [7] BiocParallel_1.32.5       Rtsne_0.16               
##   [9] DropletUtils_1.18.1       munsell_0.5.0            
##  [11] ScaledMatrix_1.6.0        codetools_0.2-18         
##  [13] units_0.8-1               DT_0.26                  
##  [15] withr_2.5.0               colorspace_2.0-3         
##  [17] highr_0.10                knitr_1.41               
##  [19] rstudioapi_0.14           labeling_0.4.2           
##  [21] GenomeInfoDbData_1.2.9    polyclip_1.10-4          
##  [23] bit64_4.0.5               farver_2.1.1             
##  [25] pheatmap_1.0.12           rhdf5_2.42.0             
##  [27] vctrs_0.5.1               generics_0.1.3           
##  [29] xfun_0.36                 R6_2.5.1                 
##  [31] ggbeeswarm_0.7.1          graphlayouts_0.8.4       
##  [33] rsvd_1.0.5                locfit_1.5-9.6           
##  [35] concaveman_1.1.0          bitops_1.0-7             
##  [37] rhdf5filters_1.10.0       cachem_1.0.6             
##  [39] RTriangle_1.6-0.11        DelayedArray_0.24.0      
##  [41] assertthat_0.2.1          promises_1.2.0.1         
##  [43] scales_1.2.1              vroom_1.6.0              
##  [45] ggraph_2.1.0              beeswarm_0.4.0           
##  [47] gtable_0.3.1              beachmat_2.14.0          
##  [49] tidygraph_1.2.2           rlang_1.0.6              
##  [51] systemfonts_1.0.4         BiocManager_1.30.19      
##  [53] yaml_2.3.6                abind_1.4-5              
##  [55] httpuv_1.6.7              tools_4.2.2              
##  [57] ellipsis_0.3.2            raster_3.6-11            
##  [59] jquerylib_0.1.4           proxy_0.4-27             
##  [61] ggridges_0.5.4            Rcpp_1.0.9               
##  [63] sparseMatrixStats_1.10.0  zlibbioc_1.44.0          
##  [65] classInt_0.4-8            purrr_1.0.0              
##  [67] RCurl_1.98-1.9            cowplot_1.1.1            
##  [69] ggrepel_0.9.2             magrittr_2.0.3           
##  [71] data.table_1.14.6         magick_2.7.3             
##  [73] hms_1.1.2                 mime_0.12                
##  [75] fftwtools_0.9-11          evaluate_0.19            
##  [77] xtable_1.8-4              jpeg_0.1-10              
##  [79] gridExtra_2.3             compiler_4.2.2           
##  [81] tibble_3.1.8              KernSmooth_2.23-20       
##  [83] crayon_1.5.2              R.oo_1.25.0              
##  [85] htmltools_0.5.4           later_1.3.0              
##  [87] tzdb_0.3.0                tiff_0.1-11              
##  [89] tidyr_1.2.1               DBI_1.1.3                
##  [91] tweenr_2.0.2              MASS_7.3-58.1            
##  [93] sf_1.0-9                  BiocStyle_2.26.0         
##  [95] Matrix_1.5-3              cli_3.5.0                
##  [97] R.methodsS3_1.8.2         parallel_4.2.2           
##  [99] igraph_1.3.5              pkgconfig_2.0.3          
## [101] sp_1.5-1                  terra_1.6-47             
## [103] svglite_2.1.0             vipor_0.4.5              
## [105] bslib_0.4.2               dqrng_0.3.0              
## [107] XVector_0.38.0            digest_0.6.31            
## [109] RcppAnnoy_0.0.20          rmarkdown_2.19           
## [111] uwot_0.1.14               edgeR_3.40.1             
## [113] distances_0.1.8           DelayedMatrixStats_1.20.0
## [115] shiny_1.7.4               rjson_0.2.21             
## [117] lifecycle_1.0.3           jsonlite_1.8.4           
## [119] Rhdf5lib_1.20.0           BiocNeighbors_1.16.0     
## [121] limma_3.54.0              fansi_1.0.3              
## [123] pillar_1.8.1              lattice_0.20-45          
## [125] fastmap_1.1.0             glue_1.6.2               
## [127] zip_2.2.2                 png_0.1-8                
## [129] svgPanZoom_0.3.4          bit_4.0.5                
## [131] ggforce_0.4.1             class_7.3-20             
## [133] stringi_1.7.8             sass_0.4.4               
## [135] HDF5Array_1.26.0          nnls_1.4                 
## [137] BiocSingular_1.14.0       irlba_2.3.5.1            
## [139] e1071_1.7-12